Univariate - MA Data Analysis
Univariable
Open days
## obs_days open_days closed_days
## 1 169 8 161
## # A tibble: 2 × 3
## is_closed n prop
## <lgl> <int> <dbl>
## 1 FALSE 161 95.3
## 2 TRUE 8 4.7
| is_closed | N | Percent |
|---|---|---|
| FALSE | 161 | 95.27 |
| TRUE | 8 | 4.73 |
| All | 169 | 100.00 |
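The percentages in the table follow directly from the day counts. As a minimal sketch in Python, using the counts reported above (the variable names are illustrative and not taken from the original analysis):

```python
# Counts copied from the table above; names are illustrative only.
counts = {"FALSE": 161, "TRUE": 8}  # is_closed -> number of observed days
total = sum(counts.values())

# Share of observed days in each category, as a percentage with two decimals.
percent = {key: round(100 * value / total, 2) for key, value in counts.items()}

print(total)
print(percent)
```

The 161 open days are the observations that enter the summaries that follow, which is why `n` is 161 in the later tables.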
## Skewness | SE
## ----------------
## 0.726 | 0.185
## Skewness | SE
## ----------------
## 1.327 | 0.185
## Skewness | SE
## ----------------
## 0.669 | 0.185
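The skewness values above can be illustrated with a moment-based estimator. The sketch below is an assumption about the kind of estimator involved, not a reproduction of the package's code: it uses the classical g1 statistic and its standard error under normality. The sample data are hypothetical. With n = 169 the standard error formula gives 0.185, which is consistent with the SE column above.

```python
import math

def skewness_g1(xs):
    """Moment-based sample skewness g1 = m3 / m2**1.5."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in xs) / n  # third central moment
    return m3 / m2 ** 1.5

def skewness_se(n):
    """Standard error of g1 under normality: sqrt(6(n-2) / ((n+1)(n+3)))."""
    return math.sqrt(6 * (n - 2) / ((n + 1) * (n + 3)))

# Hypothetical right-skewed sample, for illustration only.
sample = [0.2, 0.4, 0.5, 0.6, 0.7, 0.9, 1.1, 1.6, 2.4, 3.8]

print(round(skewness_g1(sample), 3))  # positive for a right-skewed sample
print(round(skewness_se(169), 3))     # depends only on the sample size
```

A positive value indicates a longer right tail, as in all three results above.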
Basic Summary of Dependent Variables
## # A tibble: 4 × 13
## variable n min max median q1 q3 iqr mad mean sd se
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 food_loss_… 161 0 13.8 7.35 6.7 8.4 1.7 1.11 7.83 2.17 0.171
## 2 food_waste… 161 0 6.55 2.1 1.1 2.95 1.85 1.33 2.19 1.40 0.111
## 3 liquid_was… 161 0 4.5 1.5 0.65 2.05 1.4 1.04 1.48 0.995 0.078
## 4 solid_wast… 161 0 2.95 0.65 0.35 0.95 0.6 0.445 0.708 0.499 0.039
## # ℹ 1 more variable: ci <dbl>
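The `se` column is the standard error of the mean, which can be checked from the `sd` and `n` columns as se = sd / sqrt(n). A minimal sketch in Python, using the rounded values printed above; the dictionary keys are shortened labels chosen here because the variable names are truncated in the printed output:

```python
import math

n = 161  # observations per variable, from the table above

# Reported (rounded) standard deviations. Keys are illustrative labels,
# not the full variable names, which are truncated in the printout.
sd = {
    "food_loss": 2.17,
    "food_waste": 1.40,
    "liquid_waste": 0.995,
    "solid_waste": 0.499,
}

# Standard error of the mean: se = sd / sqrt(n).
se = {name: value / math.sqrt(n) for name, value in sd.items()}

for name, value in se.items():
    print(f"{name}: {value:.3f}")
```

Small differences in the third decimal place are expected, because the inputs here are already rounded.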
Histograms
X Histogram with density
Q-Q plot
X Shapiro-Wilk test
## # A tibble: 3 × 3
## variable statistic p
## <chr> <dbl> <dbl>
## 1 food_waste_kg 0.952 0.0000260
## 2 liquid_waste_kg 0.951 0.0000192
## 3 solid_waste_kg 0.903 0.00000000783
From the output, all p-values are far below 0.05, which implies that the distributions of the data differ significantly from a normal distribution. In other words, we cannot assume normality.
Histogram Food Waste per customer
Q-Q plot Food Waste per customer
Shapiro-Wilk test per customer
## # A tibble: 3 × 3
## variable statistic p
## <chr> <dbl> <dbl>
## 1 food_waste_p_kg 0.987 1.38e- 1
## 2 liquid_waste_p_kg 0.984 6.10e- 2
## 3 solid_waste_p_kg 0.863 6.24e-11
From the output, the p-value for solid food waste per customer is far below the significance level of 0.05, but the p-values for the other two variables are not. This implies that the distribution of solid food waste per customer differs significantly from a normal distribution. In other words, normality cannot be rejected for food waste and liquid food waste per customer, but it is rejected for solid food waste per customer.
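The decision rule applied to the per-customer table can be written out explicitly. This is a minimal sketch in Python that only compares the reported p-values with the 0.05 level; it does not recompute the Shapiro-Wilk statistic, and the helper name `rejects_normality` is hypothetical:

```python
ALPHA = 0.05  # significance level used in the text

# p-values copied from the per-customer Shapiro-Wilk table above.
p_values = {
    "food_waste_p_kg": 1.38e-1,
    "liquid_waste_p_kg": 6.10e-2,
    "solid_waste_p_kg": 6.24e-11,
}

def rejects_normality(p, alpha=ALPHA):
    """Return True when the test rejects the null hypothesis of normality."""
    return p < alpha

for name, p in p_values.items():
    if rejects_normality(p):
        verdict = "reject normality"
    else:
        verdict = "no evidence against normality"
    print(f"{name}: p = {p:.3g} -> {verdict}")
```

Note that a p-value above 0.05 means only that the test found no significant departure from normality, not that normality has been demonstrated.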
Histogram logged Food Waste
Q-Q plot logged Food Waste
Shapiro-Wilk test for logged food waste
## # A tibble: 3 × 3
## variable statistic p
## <chr> <dbl> <dbl>
## 1 log_food_waste_kg 0.979 0.0153
## 2 log_liquid_waste_kg 0.972 0.00208
## 3 log_solid_waste_kg 0.979 0.0166
Time Series Plots
Daily Time Series
Daily plot per customer
Decomposition
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2) with non-zero mean : 595.2761
## ARIMA(0,0,0) with non-zero mean : 607.2775
## ARIMA(1,0,0) with non-zero mean : 598.3493
## ARIMA(0,0,1) with non-zero mean : 606.2906
## ARIMA(0,0,0) with zero mean : 795.7987
## ARIMA(1,0,2) with non-zero mean : 593.7226
## ARIMA(0,0,2) with non-zero mean : 603.5818
## ARIMA(1,0,1) with non-zero mean : 598.3892
## ARIMA(1,0,3) with non-zero mean : 594.7845
## ARIMA(0,0,3) with non-zero mean : 602.7266
## ARIMA(2,0,1) with non-zero mean : 593.1346
## ARIMA(2,0,0) with non-zero mean : 593.03
## ARIMA(3,0,0) with non-zero mean : 591.0829
## ARIMA(4,0,0) with non-zero mean : 593.9004
## ARIMA(3,0,1) with non-zero mean : 593.1032
## ARIMA(4,0,1) with non-zero mean : 594.6705
## ARIMA(3,0,0) with zero mean : 655.5828
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(3,0,0) with non-zero mean : 600.6932
##
## Best model: ARIMA(3,0,0) with non-zero mean
## Series: df$food_waste_kg
## ARIMA(3,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 mean
## 0.1053 -0.2083 -0.1262 2.0746
## s.e. 0.0788 0.0769 0.0786 0.0871
##
## sigma^2 = 1.97: log likelihood = -295.16
## AIC=600.33 AICc=600.69 BIC=615.97
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2) with non-zero mean : 242.2204
## ARIMA(0,0,0) with non-zero mean : 254.9591
## ARIMA(1,0,0) with non-zero mean : 242.9804
## ARIMA(0,0,1) with non-zero mean : 254.9337
## ARIMA(0,0,0) with zero mean : 424.4576
## ARIMA(1,0,2) with non-zero mean : 240.5345
## ARIMA(0,0,2) with non-zero mean : 253.0456
## ARIMA(1,0,1) with non-zero mean : 242.4608
## ARIMA(1,0,3) with non-zero mean : 241.1252
## ARIMA(0,0,3) with non-zero mean : 252.9766
## ARIMA(2,0,1) with non-zero mean : 240.7382
## ARIMA(2,0,3) with non-zero mean : 243.1306
## ARIMA(1,0,2) with zero mean : 290.294
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(1,0,2) with non-zero mean : 252.8433
##
## Best model: ARIMA(1,0,2) with non-zero mean
## Series: df$solid_waste_kg
## ARIMA(1,0,2) with non-zero mean
##
## Coefficients:
## ar1 ma1 ma2 mean
## 0.3933 -0.3011 -0.2195 0.6723
## s.e. 0.2334 0.2269 0.0728 0.0303
##
## sigma^2 = 0.2516: log likelihood = -121.24
## AIC=252.48 AICc=252.84 BIC=268.12
##
## Fitting models using approximations to speed things up...
##
## ARIMA(2,0,2) with non-zero mean : 481.848
## ARIMA(0,0,0) with non-zero mean : 489.7931
## ARIMA(1,0,0) with non-zero mean : 483.6428
## ARIMA(0,0,1) with non-zero mean : 488.6056
## ARIMA(0,0,0) with zero mean : 668.5145
## ARIMA(1,0,2) with non-zero mean : 481.4292
## ARIMA(0,0,2) with non-zero mean : 487.558
## ARIMA(1,0,1) with non-zero mean : 484.5832
## ARIMA(1,0,3) with non-zero mean : 482.8695
## ARIMA(0,0,3) with non-zero mean : 487.0004
## ARIMA(2,0,1) with non-zero mean : 480.5155
## ARIMA(2,0,0) with non-zero mean : 480.0232
## ARIMA(3,0,0) with non-zero mean : 478.3711
## ARIMA(4,0,0) with non-zero mean : 480.7297
## ARIMA(3,0,1) with non-zero mean : 480.1401
## ARIMA(4,0,1) with non-zero mean : 479.0072
## ARIMA(3,0,0) with zero mean : 539.5893
##
## Now re-fitting the best model(s) without approximations...
##
## ARIMA(3,0,0) with non-zero mean : 484.9027
##
## Best model: ARIMA(3,0,0) with non-zero mean
## Series: df$liquid_waste_kg
## ARIMA(3,0,0) with non-zero mean
##
## Coefficients:
## ar1 ar2 ar3 mean
## 0.1128 -0.1804 -0.124 1.4030
## s.e. 0.0780 0.0767 0.078 0.0638
##
## sigma^2 = 0.9932: log likelihood = -237.27
## AIC=484.53 AICc=484.9 BIC=500.18
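The AIC, AICc and BIC values printed for the three final models can be checked from the reported log likelihoods. The sketch below uses the textbook definitions and rests on two assumptions not stated in the output: each model has k = 5 estimated parameters (three ARMA coefficients, the mean, and the innovation variance), and the series length is n = 169. The function name `information_criteria` is hypothetical.

```python
import math

def information_criteria(log_lik, k, n):
    """Return (AIC, AICc, BIC) for a model with k parameters fitted to n points."""
    aic = -2 * log_lik + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2 * log_lik + k * math.log(n)
    return aic, aicc, bic

K = 5    # assumed: 3 ARMA coefficients + mean + innovation variance
N = 169  # assumed series length

# Log likelihoods copied from the output above.
log_likelihoods = {
    "food_waste_kg": -295.16,
    "solid_waste_kg": -121.24,
    "liquid_waste_kg": -237.27,
}

results = {name: information_criteria(ll, K, N) for name, ll in log_likelihoods.items()}
for name, (aic, aicc, bic) in results.items():
    print(f"{name}: AIC={aic:.2f} AICc={aicc:.2f} BIC={bic:.2f}")
```

Under these assumptions the results agree with the printed values to within about 0.01, the difference coming from the rounded log likelihoods.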
Boxplots - weekly
Boxplots per customer - weekly
Bar plot - weekly
Boxplot - monthly
Boxplot per customer - monthly
Time Series Plots for Independent Variables
(Partial and) Autocorrelation Function
Spectral Analysis
## [1] 3.214286
## [1] 5.294118
## [1] 5.142857
## [1] 5.294118
The spectral peaks suggest a period of roughly 6 days for food waste, whereas food loss shows a cycle of approximately 3 days or 20 days.
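A dominant period of this kind is read off a periodogram as the reciprocal of the frequency with the largest power. The sketch below illustrates the idea in Python on a synthetic series, not on the study data. It uses a plain discrete Fourier transform rather than the estimator behind the output above, and the function name `dominant_period` is hypothetical. The printed values equal 180/56, 180/34 and 180/35, which would be consistent with a series padded to 180 points, although the output does not state this.

```python
import cmath
import math

def dominant_period(series):
    """Return the period (in time steps) of the strongest periodogram peak."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]  # remove the mean before transforming

    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):  # Fourier frequencies k / n
        coef = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        power = abs(coef) ** 2 / n
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k  # period = 1 / frequency

# Synthetic daily series: a strong 6-day cycle plus a weaker 20-day cycle.
n_days = 180
synthetic = [
    math.sin(2 * math.pi * t / 6) + 0.5 * math.sin(2 * math.pi * t / 20)
    for t in range(n_days)
]

print(dominant_period(synthetic))
```

For this synthetic series the strongest peak is at the 6-day cycle. With real data the peaks are broader, so neighbouring periods such as 5.1 and 5.3 days should be read as the same approximate cycle.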